Analogy Training Multilingual Encoders
Authors
Abstract
Language encoders encode words and phrases in ways that capture their local semantic relatedness, but are known to be globally inconsistent. Global inconsistency can seemingly be corrected for, in part, by leveraging signals from knowledge bases, but previous results are partial and limited to monolingual English encoders. We extract a large-scale multilingual, multi-word analogy dataset from Wikidata for diagnosing and correcting global inconsistencies, and then implement a four-way Siamese BERT architecture for grounding multilingual BERT (mBERT) through analogy training. We show that analogy training not only improves the global consistency of mBERT, as well as the isomorphism of language-specific subspaces, but also leads to consistent gains on downstream tasks such as bilingual dictionary induction and sentence retrieval.
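The abstract describes the four-way Siamese setup only at a high level. The snippet below is a minimal sketch of what such an architecture could look like: one mBERT encoder with shared weights embeds all four analogy terms, and a small head scores whether a : b :: c : d holds. The mean pooling, the offset-based scoring head, and the binary cross-entropy loss are illustrative assumptions, not details taken from the paper.

# Hedged sketch of a four-way Siamese mBERT analogy scorer (assumptions noted above).
import torch
import torch.nn as nn
from transformers import AutoModel, AutoTokenizer

class SiameseAnalogyModel(nn.Module):
    def __init__(self, model_name="bert-base-multilingual-cased"):
        super().__init__()
        # A single encoder with shared weights embeds all four analogy terms.
        self.encoder = AutoModel.from_pretrained(model_name)
        hidden = self.encoder.config.hidden_size
        # Assumed scoring head: compare the two relation offsets (b - a and d - c)
        # and emit one logit for "a : b :: c : d holds".
        self.classifier = nn.Sequential(
            nn.Linear(2 * hidden, hidden),
            nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def encode(self, batch):
        # Mean-pool token states over the attention mask (one assumed pooling choice).
        states = self.encoder(**batch).last_hidden_state
        mask = batch["attention_mask"].unsqueeze(-1).float()
        return (states * mask).sum(1) / mask.sum(1).clamp(min=1e-9)

    def forward(self, a, b, c, d):
        ea, eb, ec, ed = (self.encode(t) for t in (a, b, c, d))
        return self.classifier(torch.cat([eb - ea, ed - ec], dim=-1)).squeeze(-1)

tok = AutoTokenizer.from_pretrained("bert-base-multilingual-cased")
model = SiameseAnalogyModel()
enc = lambda s: tok(s, return_tensors="pt", padding=True)
logit = model(enc("Paris"), enc("France"), enc("Berlin"), enc("Germany"))
loss = nn.functional.binary_cross_entropy_with_logits(logit, torch.ones(1))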
Similar resources
Gradual training of deep denoising auto encoders
Stacked denoising auto encoders (DAEs) are well known to learn useful deep representations, which can be used to improve supervised training by initializing a deep network. We investigate a training scheme of a deep DAE, where DAE layers are gradually added and keep adapting as additional layers are added. We show that in the regime of mid-sized datasets, this gradual training provides a small ...
Gradual Training Method for Denoising Auto Encoders
Stacked denoising auto encoders (DAEs) are well known to learn useful deep representations, which can be used to improve supervised training by initializing a deep network. We investigate a training scheme of a deep DAE, where DAE layers are gradually added and keep adapting as additional layers are added. We show that in the regime of mid-sized datasets, this gradual training provides a small ...
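The gradual scheme described in the two items above, where DAE layers are added one at a time while all previously added layers keep adapting, could be implemented roughly as follows. This is an illustrative sketch only; the layer sizes, Gaussian corruption, and optimizer settings are assumptions, not the authors' configuration.

# Hedged sketch: gradual training of a stacked denoising auto-encoder.
import torch
import torch.nn as nn

def train_gradual_dae(data, layer_sizes=(784, 256, 64), noise_std=0.3,
                      epochs_per_stage=5, lr=1e-3):
    encoder, decoder = nn.ModuleList(), nn.ModuleList()
    for depth in range(1, len(layer_sizes)):
        # Add the next encoder/decoder pair; earlier layers stay trainable,
        # unlike greedy layer-wise pre-training, which would freeze them.
        encoder.append(nn.Linear(layer_sizes[depth - 1], layer_sizes[depth]))
        decoder.insert(0, nn.Linear(layer_sizes[depth], layer_sizes[depth - 1]))
        opt = torch.optim.Adam(list(encoder.parameters()) + list(decoder.parameters()), lr=lr)
        for _ in range(epochs_per_stage):
            for x in data:
                noisy = x + noise_std * torch.randn_like(x)   # denoising corruption
                h = noisy
                for layer in encoder:
                    h = torch.relu(layer(h))
                for i, layer in enumerate(decoder):
                    h = layer(h) if i == len(decoder) - 1 else torch.relu(layer(h))
                loss = nn.functional.mse_loss(h, x)           # reconstruct the clean input
                opt.zero_grad()
                loss.backward()
                opt.step()
    return encoder, decoder

# Toy usage: random mini-batches standing in for a real dataset.
toy_batches = [torch.rand(32, 784) for _ in range(10)]
enc_layers, dec_layers = train_gradual_dae(toy_batches, epochs_per_stage=1)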
Training Auto-encoders Effectively via Eliminating Task-irrelevant Input Variables
Auto-encoders are often used as building blocks of deep network classifiers to learn feature extractors, but task-irrelevant information in the input data may lead to bad extractors and result in poor generalization performance of the network. In this paper, via dropping the task-irrelevant input variables, the performance of auto-encoders can be obviously improved. Specifically, an importance-bas...
Is Joint Training Better for Deep Auto-Encoders?
Traditionally, when generative models of data are developed via deep architectures, greedy layer-wise pre-training is employed. In a well-trained model, the lower layer of the architecture models the data distribution conditional upon the hidden variables, while the higher layers model the hidden distribution prior. But due to the greedy scheme of the layerwise training technique, the parameter...
Journal
Journal title: Proceedings of the ... AAAI Conference on Artificial Intelligence
Year: 2021
ISSN: 2159-5399, 2374-3468
DOI: https://doi.org/10.1609/aaai.v35i14.17524